[PT2E] Fix per-tensor observer issue with varying shape & rank #2177
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2177
Note: Links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit b03b1e6 with merge base 07ca637.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from 87f1249 to 2ac41fb (Compare).
@@ -1891,6 +1891,10 @@ def convert(self, model: torch.fx.GraphModule, observer_node: Node):
    assert self.original_dtype is not None, (
        "Expecting original_dtype to be populated"
    )
    # Since input shape & rank may change (e.g. Resnet18), here we need to update block_size for each input
    self.block_size = get_block_size(
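For context, a minimal sketch (illustrative only, not the torchao implementation) of why a per-tensor block_size has to track the rank of each observed input:

```python
# Minimal sketch, not the torchao implementation: for per-tensor quantization the
# single block covers the whole tensor, so block_size must have the same rank as
# whatever tensor is currently being observed and has to be recomputed per input.
import torch

def per_tensor_block_size(x: torch.Tensor) -> tuple:
    return tuple(x.shape)

x_conv = torch.randn(1, 64, 56, 56)  # rank-4 activation, e.g. inside ResNet18
x_fc = torch.randn(1, 512)           # rank-2 activation before the final linear

assert per_tensor_block_size(x_conv) == (1, 64, 56, 56)
assert per_tensor_block_size(x_fc) == (1, 512)  # different rank, different block_size
```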
when does this happen? can you give an example? I thought using -1 for dynamic dimensions would be enough?
To reproduce the issue, you may run the code here: #2094 (comment). You will have to use -1 for block_size without updating self.block_size here.
are you saying the rank / number of dimensions changes for the input as well? can we use a single -1 to represent this case?
> are you saying the rank / number of dimensions changes for the input as well?

Yes.

> can we use a single -1 to represent this case?

I think it's doable, but there are checks that guard len(self.block_size) == len(input.shape). We need to handle the special case for per-tensor quant at those locations. Is that ok?
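To make the constraint concrete, here is a hypothetical sketch of the kind of rank guard being discussed; a bare single -1 would trip it unless per-tensor quant is special-cased:

```python
# Hypothetical sketch of the guard under discussion, not the actual torchao code.
def check_block_size(block_size, input_shape):
    assert len(block_size) == len(input_shape), (
        f"rank mismatch: block_size has {len(block_size)} dims, "
        f"input has {len(input_shape)}"
    )

check_block_size((-1, -1, -1, -1), (1, 64, 56, 56))   # per-dimension -1: passes
# check_block_size((-1,), (1, 64, 56, 56))            # a single -1: fails the rank check
```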
@jerryzh168 Could you please review this PR? Thanks.

Hi @jerryzh168 @drisspg Could you please review this PR? I am not sure if the current implementation is what you expected. Thanks.
@@ -113,7 +113,8 @@ def _get_reduction_params(block_size, input_size):
        shape_for_reduction: (3, 3, 5, 2, 10)
        reduction_dim: [0, 1, 3, 4]
    """
    assert len(block_size) == len(input_size)
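For reference, a hedged re-implementation of the logic described by the docstring above (illustrative only; the real helper is _get_reduction_params in torchao), extended to treat -1 as "the whole dimension is one block":

```python
# Illustrative sketch, not the torchao code: dims fully covered by their block are
# reduced as a whole; other dims are split into (num_blocks, block) and the block
# part is reduced. A -1 entry is treated the same as a full-size block (assumption).
def get_reduction_params(block_size, input_size):
    assert len(block_size) == len(input_size), "rank mismatch"
    shape_for_reduction, reduction_dims = [], []
    for blk, dim in zip(block_size, input_size):
        if blk == dim or blk == -1:           # whole dimension is a single block
            shape_for_reduction.append(dim)
        else:                                 # split the dimension into blocks
            shape_for_reduction.extend([dim // blk, blk])
        reduction_dims.append(len(shape_for_reduction) - 1)
    return tuple(shape_for_reduction), reduction_dims

print(get_reduction_params((3, 3, 2, 10), (3, 3, 10, 10)))
# ((3, 3, 5, 2, 10), [0, 1, 3, 4]) -- matches the docstring example above
```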
is this still used? we should be using the code in quant_primitives.py I think
Yes. It's still used when running the prepared model (the model after prepare_pt2e). Is it a bug? Do I need to fix it, too?
I am using the observers defined here: torchao/quantization/pt2e/_affine_quantization.py
Hi @jerryzh168 May I know your suggestion on this? Thanks.
I think we should be using the ones in torchao/quantization/observer.py eventually. The only occurrence seems to be
ao/test/quantization/pt2e/test_quantize_pt2e.py (line 2531 in 554cb60):
    AffineQuantizedMinMaxObserver,
so if you are adding new things I'd recommend using the one from torchao.quantization.
@Xia-Weiwen sorry for the delay, please feel free to work on this
I thought we already use the one from torchao:
ao/torchao/quantization/pt2e/prepare.py (line 35 in 212d912):
    PartialWrapper,
@jerryzh168 Thanks for the reply. I did not mean torch.ao. I meant there are two versions of such utilities in torchao: torchao.quantization.pt2e and torchao.quantization. For example,
ao/torchao/quantization/pt2e/observer.py (line 84 in 5153bd3):
    class PartialWrapper:
and
ao/torchao/quantization/observer.py (line 32 in 5153bd3):
    class _PartialWrapper:
The PT2E flow in torchao uses those in torchao.quantization.pt2e, while you said you wanted to switch to torchao/quantization/observer.py. So, I was asking whether you would switch to torchao/quantization/observer.py in the PT2E flow first. Do you have any suggestions on that? Thanks.
oh I see, yeah for now using torchao/quantization/observer.py would be better I think; we haven't finalized the folder structure for this one yet.
@jerryzh168 Am I supposed to wait until you finalize the folder structure? Thanks.
Would be good to add a test for this.
Fixes #2094 and #2112

We may see inputs with varying shapes and ranks, e.g. when running ResNet18. The current implementation relies on a fixed block_size, which is not enough for such cases. The fix is simple: use block_size = -1 for each dimension for per-tensor quantization, and update block_size for each input when inserting q/dq in convert.
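As a concrete illustration of the "varying shapes and ranks" scenario (a hypothetical toy model, not the project's test), a ResNet-style network produces both rank-4 conv activations and a rank-2 tensor after flatten, so a block_size fixed once at prepare time cannot match every observed input:

```python
# Hypothetical toy model for illustration only; mirrors the shape pattern of ResNet18.
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(8, 10)

    def forward(self, x):
        x = self.conv(x)         # (N, 8, H, W): rank 4
        x = self.pool(x)
        x = torch.flatten(x, 1)  # (N, 8): rank 2
        return self.fc(x)

print(TinyNet()(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])
```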